ABSTRACT

Three different nonparametric tests for scale, the Siegel-Tukey (S-T), the Mood (M), and the Normal Scores (NS) tests, are compared in order to contrast different methods of scale test development and usage. Procedures for developing the three scale tests are discussed, and two examples of the use of each test in solving the same problem are given. The results of the two examples show that all three tests tend to give equivalent answers.
Behavioral scientists frequently need to determine whether random samples have been drawn from populations having the same variance. For the two-sample problem, such a hypothesis of equal variance (scale) is tested by forming the ratio of the two sample variances and then checking it for significance against the F distribution. Most researchers fail to realize that this test is appropriate only when the underlying distribution of scores is approximately normal. Whereas z and t tests are robust to violations of the normality assumption, especially when N is large, $\chi^2$ and F tests of variance are extremely sensitive to nonnormality of the sample data. When the normality assumption cannot be met, the researcher must turn to a nonparametric test of the hypothesis of interest.

A number of nonparametric tests have been proposed as alternatives to the parametric F test. Under specified conditions, each would be considered a "good" test in its own way. Since nonparametric tests generally substitute some other quantity for the actual data, a primary distinction between these tests is the variable substituted in place of the original scores. In this paper three nonparametric tests for scale are compared in order to contrast different methods of development and usage: the Siegel-Tukey test (Siegel & Tukey, 1960) (S-T), the Mood test (Mood, 1954) (M), and the Normal Scores test (Capon, 1961) (NS). There are a number of asymptotically equivalent forms of the normal scores test for scale; the test presented in this paper is developed from expected normal order statistics.

Procedures for Developing Scale Tests 

Let $X_1, X_2, \ldots, X_n$ be an independent random sample from a distribution F and $Y_1, Y_2, \ldots, Y_m$ be an independent random sample from a distribution G. Combining the $n + m = N$ scores of the two samples and ranking them in ascending order produces the array of ordered scores $V^1 < V^2 < \cdots < V^N$. If ties exist, break them at random.

The hypothesis ($H_0$) to be tested is that the two distributions, F and G, are identical with respect to scale. The alternative hypothesis ($H_1$) is that the dispersion of scores is different for F and G.
Siegel-Tukey Test

This test replaces the pooled data from the two samples with a reordering of the ranks (i) from 1 to N. To illustrate the ranking procedure, consider the following table, in which N is assumed to be an even number.

Ordered score:     V^1  V^2  V^3  V^4  ...  V^(N/2)  ...  V^(N-3)  V^(N-2)  V^(N-1)  V^N
Rank replacement:   1    4    5    8   ...     N     ...     7        6        3       2

At the left end of the ordered set of scores, $V^1$ is assigned a rank of 1. The procedure then moves to the extreme right end of the ordered scores, where $V^N$ is replaced by the rank 2 and $V^{N-1}$ by the rank 3. Now move back to the left and substitute the rank 4 for $V^2$ and 5 for $V^3$. Operating in pairs, this process is repeated until all ordered scores have been replaced by their appropriate ranks. If N is an odd number, throw out the middle score so that the ranks can be assigned in complete pairs, giving the test the desired symmetry.

If there is no difference in scale between the populations from which the two samples are drawn, then the average rank associated with each sample should be approximately equal (with equal sample sizes, so should the rank sums). On the other hand, if the spread of scores is not homogeneous for the two groups, the sample with the greater spread will tend to occupy the extremes of the ordered array, which receive the low ranks; its rank sum will therefore tend to be smaller than the rank sum of the more compressed sample.
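The alternating assignment described above can be written as a short routine. The Python sketch below is our own illustration rather than code from Siegel and Tukey; the function name `siegel_tukey_ranks` is hypothetical, and the sketch assumes ties have already been broken at random and that N has been made even by discarding the middle score.

```python
def siegel_tukey_ranks(n_scores):
    """Replacement rank for each position of the ordered pooled sample.

    Position 0 holds the smallest score. Rank 1 goes to the lowest score;
    later ranks are handed out two at a time, alternating between the
    high end and the low end of the ordered array.
    """
    if n_scores % 2 == 1:
        raise ValueError("N is odd: discard the middle score first")
    ranks = [0] * n_scores
    lo, hi = 0, n_scores - 1
    ranks[lo] = 1
    lo += 1
    rank = 2
    take_from_high_end = True
    while lo <= hi:
        for _ in range(2):              # two ranks per visit to one end
            if lo > hi:
                break
            if take_from_high_end:
                ranks[hi] = rank
                hi -= 1
            else:
                ranks[lo] = rank
                lo += 1
            rank += 1
        take_from_high_end = not take_from_high_end
    return ranks


print(siegel_tukey_ranks(12))
# [1, 4, 5, 8, 9, 12, 11, 10, 7, 6, 3, 2]
```

For N = 12 this reproduces the rank-replacement pattern in the table above, with the largest rank, N, falling on $V^{N/2}$.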

Using an indicator variable $Z_i$, let $Z_i = 1$ if the score receiving replacement rank $i$ belongs to the X sample and $Z_i = 0$ if it belongs to the Y sample. This indicator variable is useful in setting up the test statistic, which is defined as

$$S\text{-}T = \sum_{i=1}^{N} i Z_i$$

The null distribution of S-T is exactly the same as that of the Wilcoxon rank-sum statistic. Thus, Wilcoxon tables (Owen, 1962) can be used to determine the significance of S-T for N ≤ 20; equivalent tables have been developed by Siegel and Tukey (1960). For N > 20 the distribution of S-T is approximately normal with

$$E(S\text{-}T) = \frac{n(N+1)}{2}, \qquad \operatorname{Var}(S\text{-}T) = \frac{nm(N+1)}{12},$$

and the test statistic becomes

$$Z = \frac{S\text{-}T - E(S\text{-}T)}{\sqrt{\operatorname{Var}(S\text{-}T)}}.$$
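As a quick numerical check of these formulas, the sketch below (the helper name `siegel_tukey_z` is our own) computes the large-sample Z. The inputs are the Example II values reported later in this paper (S-T = 585, n = m = 25).

```python
import math


def siegel_tukey_z(st, n, m):
    """Large-sample Z for S-T, the sum of the X sample's replacement ranks.

    n is the size of the X sample and m the size of the Y sample.
    """
    big_n = n + m
    mean = n * (big_n + 1) / 2
    var = n * m * (big_n + 1) / 12
    return (st - mean) / math.sqrt(var)


print(f"{siegel_tukey_z(585, 25, 25):.2f}")   # -1.02
```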
Mood Test

The rank procedure developed by Mood is closely analogous to the parametric F test. Replace the scores in the pooled sample by their corresponding ranks. Since the mean of the ranks from 1 to N is (N+1)/2, determine the sum of squared rank deviations about this mean for the X sample. This needs to be done only for the X sample because, with ranked data, the sum of squared deviations for sample X plus the sum of squared deviations for sample Y always equals the constant $N(N^2-1)/12$. When the samples are of unequal size, it is customary to select the smaller of the two samples for purposes of analysis.
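The constant-total property is easy to verify numerically. This minimal sketch (the helper `squared_rank_deviations` is hypothetical) uses the ranks of the two groups in Example I, as listed in Table I.

```python
def squared_rank_deviations(ranks, big_n):
    """Sum of (i - (N+1)/2)^2 over the given ranks i."""
    centre = (big_n + 1) / 2
    return sum((i - centre) ** 2 for i in ranks)


big_n = 12
experimental = [1, 3, 9, 10, 11, 12]   # ranks of the experimental group, Table I
control = [2, 4, 5, 6, 7, 8]           # ranks of the control group, Table I

print(squared_rank_deviations(experimental, big_n),
      squared_rank_deviations(control, big_n),
      big_n * (big_n ** 2 - 1) / 12)   # 111.5 31.5 143.0
```

The two group totals, 111.5 and 31.5, add to $N(N^2-1)/12 = 143$, so once the X-sample total is known the Y-sample total carries no extra information.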

The indicator variable $Z_i$ can once again be used to develop the test statistic. Let $Z_i = 1$ if the rank $i$ is associated with the X sample and $Z_i = 0$ otherwise. The test statistic for N ≤ 20 is

$$M = \sum_{i=1}^{N} \left(i - \frac{N+1}{2}\right)^2 Z_i$$

A large value of M implies that the variability of the X sample is greater than the variability of the Y sample; a small value of M leads to the opposite conclusion. A table of critical values for N ≤ 20 is not available at the present time. Nevertheless, critical values for a given sample size and alpha level can be derived quickly by enumerating the equally likely ways of assigning n of the ranks to the X sample.
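One way to derive such critical values is brute-force enumeration of all C(N, n) equally likely rank assignments. The sketch below assumes no ties (ranks 1 to N), and the function names are our own. Enumeration is practical for N ≤ 20, where C(20, 10) = 184,756.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def mood_null_distribution(big_n, n):
    """Exact null distribution of Mood's M (no ties): value -> number of
    the C(N, n) equally likely rank sets that produce it."""
    centre = (big_n + 1) / 2
    scores = [(i - centre) ** 2 for i in range(1, big_n + 1)]
    return Counter(sum(subset) for subset in combinations(scores, n))


def upper_tail(dist, observed):
    """Exact P(M >= observed); the small tolerance guards float round-off."""
    total = sum(dist.values())
    hits = sum(count for value, count in dist.items() if value >= observed - 1e-9)
    return Fraction(hits, total)


def critical_value(dist, alpha):
    """Smallest attainable value c with P(M >= c) <= alpha."""
    for value in sorted(dist):
        if upper_tail(dist, value) <= alpha:
            return value
    return None


dist = mood_null_distribution(12, 6)     # N = 12, n = 6 as in Example I
print(sum(dist.values()))                # 924
print(upper_tail(dist, 111.5))           # 3/154, i.e. 18/924
print(critical_value(dist, 0.05))
```

Because the squared deviations are all multiples of 0.25, the floating-point sums here are exact, so the `Counter` keys group identical values correctly.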

When N is greater than 20, M is approximately normally distributed with

$$E(M) = \frac{n(N^2-1)}{12}, \qquad \operatorname{Var}(M) = \frac{nm(N+1)(N^2-4)}{180}.$$

In this case the test statistic is

$$Z = \frac{M - E(M)}{\sqrt{\operatorname{Var}(M)}}.$$
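A sketch of the large-sample computation follows (the helper `mood_z` is hypothetical), checked against the Example II values reported later in this paper (M = 5936.25, n = m = 25).

```python
import math


def mood_z(m_stat, n, m):
    """Large-sample Z for Mood's M computed on the X sample (size n)."""
    big_n = n + m
    mean = n * (big_n ** 2 - 1) / 12
    var = n * m * (big_n + 1) * (big_n ** 2 - 4) / 180
    return (m_stat - mean) / math.sqrt(var)


print(f"{mood_z(5936.25, 25, 25):.2f}")   # 1.10
```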
Normal Scores Test

This test represents an attempt to reconstruct the F test using expected normal order statistics, $E(V^i)$, in place of the original scores. If $V^i$ is the $i$th ranked score in the combined sample of size N, then $E(V^i)$ is the expected value of the score in the $i$th position, assuming a sample of N scores from a standard normal population (the values therefore depend on N). The term $E(V^i)$ acts as a distance measure in much the same way that z scores express relative distance in a normal distribution. The data for the two samples are first pooled and then ranked from low to high. The ranks are then replaced by the corresponding $E(V^i)$s. High ranks will have large positive $E(V^i)$s, low ranks will have large negative $E(V^i)$s, and ranks toward the middle of the distribution will have $E(V^i)$s close to zero. For a given N, the $E(V^i)$s are symmetric about zero. Tables of expected normal order statistics can be found in Owen's (1962) Handbook of Statistical Tables.
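When Owen's tables are not at hand, the $E(V^i)$ can be approximated numerically. The sketch below is an illustration under stated assumptions, not the method used to build the published tables. It applies Simpson's rule to the standard integral for the expected value of the $i$th order statistic of N standard normal scores, truncating the range of integration to [-10, 10]. The function names and grid settings are our own.

```python
import math


def normal_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_normal_order_stats(big_n, lower=-10.0, upper=10.0, steps=4000):
    """Approximate E(V^i), i = 1..N, for N standard normal scores.

    Uses Simpson's rule on the order-statistic integral
        E(V^i) = N * C(N-1, i-1) * Int x phi(x) Phi(x)^(i-1) (1 - Phi(x))^(N-i) dx
    truncated to [lower, upper]; `steps` must be even.
    """
    h = (upper - lower) / steps
    xs = [lower + k * h for k in range(steps + 1)]
    # Simpson weights 1, 4, 2, 4, ..., 2, 4, 1
    weights = [1 if k in (0, steps) else (4 if k % 2 else 2) for k in range(steps + 1)]
    pdf = [normal_pdf(x) for x in xs]
    cdf = [normal_cdf(x) for x in xs]
    stats = []
    for i in range(1, big_n + 1):
        coef = big_n * math.comb(big_n - 1, i - 1)
        total = sum(w * x * p * c ** (i - 1) * (1.0 - c) ** (big_n - i)
                    for w, x, p, c in zip(weights, xs, pdf, cdf))
        stats.append(coef * total * h / 3.0)
    return stats


# Upper half for N = 12; expected to agree with Table I to three decimals:
# [0.103, 0.312, 0.537, 0.793, 1.116, 1.629]
print([round(v, 3) for v in expected_normal_order_stats(12)[6:]])
```

Because the values are symmetric about zero, only half of them need to be tabled; the lower half is the negative of the upper half.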

The normal scores test is completely analogous to Mood's test except that expected normal order statistics replace ranks in the formulation of the test statistic. As in the previously described tests, it is customary to work with data from the smaller of the two samples. Using the indicator variable $Z_i$, let $Z_i = 1$ if $E(V^i)$ is associated with the X sample and $Z_i = 0$ if it is linked to the Y sample. Since the mean of the $E(V^i)$s is zero, the test statistic reduces to

$$NS = \sum_{i=1}^{N} \left(E(V^i)\right)^2 Z_i$$

A table of critical values for N ≤ 20 is not available, but the probability associated with a given value of NS is easily determined by the same kind of enumeration used for M.

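For N > 20, use the large-sample normal approximation. The mean and variance of NS are given by

$$E(NS) = \frac{n}{N} \sum_{i=1}^{N} \left(E(V^i)\right)^2$$

and

$$\operatorname{Var}(NS) = \frac{nm}{N(N-1)} \sum_{i=1}^{N} \left(E(V^i)\right)^4 - \frac{m\,\left(E(NS)\right)^2}{n(N-1)},$$

and the test statistic is $Z = \left(NS - E(NS)\right)/\sqrt{\operatorname{Var}(NS)}$.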
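The two moment formulas can be evaluated directly from the column totals $\sum (E(V^i))^2$ and $\sum (E(V^i))^4$ over the pooled sample. This sketch (the helper `normal_scores_moments` is hypothetical) plugs in the Example II totals from Table II: 29.625 + 17.809 = 47.434 and 87.818 + 31.738 = 119.556.

```python
import math


def normal_scores_moments(sum_sq, sum_4th, n, m):
    """Null mean and variance of NS for samples of size n (X) and m (Y).

    sum_sq:  sum over all N pooled positions of (E(V^i))^2
    sum_4th: sum over all N pooled positions of (E(V^i))^4
    """
    big_n = n + m
    mean = n * sum_sq / big_n
    var = (n * m * sum_4th / (big_n * (big_n - 1))
           - m * mean ** 2 / (n * (big_n - 1)))
    return mean, var


mean, var = normal_scores_moments(47.434, 119.556, 25, 25)
z = (29.625 - mean) / math.sqrt(var)    # NS = 29.625 for the arousal group
print(f"{mean:.2f} {var:.2f} {z:.2f}")  # 23.72 19.02 1.35
```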




Example I 

An experimenter wishes to determine whether a special training program will influence the abstract reasoning scores of nine-year-old mentally retarded females. To test his theory he selects 12 nine-year-old girls (all that were available) whose recorded IQ scores on the Stanford-Binet lie between 65 and 75. He randomly assigns six of the children to the experimental condition and six to the control condition. After training the experimental group for a month, the experimenter gives both groups an abstract reasoning test. The results are as follows:

Experimental    Control
     19            20
     21            22
     27            23
     30            23
     31            25
     35            26

He believes that the scores of the group receiving special training will have a greater dispersion than those of the control group. Is he justified in making this conjecture? Let the probability of a Type I error be 0.05 or less. Data pertinent to analyzing this problem by the procedures introduced above are presented in Table I.

(Insert Table I here)

This experiment will be analyzed using the parametric F test, the S-T test, the M test, and the NS test. The hypothesis under test ($H_0$) is that the dispersion (scale) is the same for the experimental and control groups. The alternative hypothesis is that score variability is greater for the experimental group.

F Test 

To use this test validly, the distribution of scores must resemble a normal curve. There is no way to justify this assumption in Example I. Nevertheless, F will be computed for comparison purposes.

$$F = \frac{s_E^2}{s_C^2} = \frac{37.77}{4.57} = 8.26$$

If the test is conducted at the 0.05 level, the decision rule is to reject $H_0$ if $F > F_{.95}(5, 5) = 5.05$. Since F = 8.26, the hypothesis ($H_0$) is rejected. The variance of the scores in the experimental group is significantly larger than the variance of the control group scores.
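The F ratio can be reproduced with Python's standard `statistics.variance`, which uses the n - 1 divisor. Working from unrounded variances gives 8.27; the 8.26 above comes from dividing the rounded values 37.77 and 4.57.

```python
from statistics import variance

experimental = [19, 21, 27, 30, 31, 35]
control = [20, 22, 23, 23, 25, 26]

s2_e = float(variance(experimental))   # sample variance with divisor n - 1
s2_c = float(variance(control))
f_ratio = s2_e / s2_c
print(f"{s2_e:.2f} {s2_c:.2f} {f_ratio:.2f}")   # 37.77 4.57 8.27
```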

Siegel-Tukey Test 

Using the reordered (S-T) ranks from Table I and letting $Z_i = 1$ for the experimental group and $Z_i = 0$ for the control group, the test statistic is

$$S\text{-}T = \sum_{i=1}^{N} i Z_i = 24$$

According to the tables in Owen's (1962) Handbook, a value of S-T ≤ 24 would occur less than 1% of the time. Therefore, the hypothesis ($H_0$) is rejected at the 0.05 level.
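The tail probability can be confirmed by enumeration, since under $H_0$ each of the 924 possible sets of six S-T ranks is equally likely. A minimal sketch:

```python
from fractions import Fraction
from itertools import combinations

big_n, n = 12, 6
observed = 24   # S-T for the experimental group (Table I)

# Under H0 every set of n replacement ranks drawn from 1..N is equally likely,
# which is why the S-T null distribution coincides with the Wilcoxon one.
rank_sums = [sum(c) for c in combinations(range(1, big_n + 1), n)]
p_lower = Fraction(sum(1 for s in rank_sums if s <= observed), len(rank_sums))
print(len(rank_sums), p_lower)   # 924 1/132  (i.e. 7/924, about 0.0076)
```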




Mood Test 

Performing the analysis on the experimental group scores, the test statistic is

$$M = \sum_{i=1}^{N} \left(i - \frac{N+1}{2}\right)^2 Z_i = 111.5$$

For the M test there are no critical value tables to determine whether or not $H_0$ is to be rejected. Therefore, the exact probability of obtaining a value of M greater than or equal to 111.5 must be computed.

The total number of ways of dividing 12 subjects into two groups of six each is $\binom{12}{6} = 924$. Working exclusively with the experimental group, a sum of six squared deviations about the mean greater than or equal to 111.5 can occur in exactly 18 ways. Probability statements regarding possible values of M in the upper tail of the distribution are as follows:

P(M ≥ 125.5) = 1/924 = 0.001
P(M ≥ 119.5) = 5/924 = 0.005
P(M ≥ 115.5) = 9/924 = 0.010
P(M ≥ 113.5) = 14/924 = 0.015
P(M ≥ 111.5) = 18/924 = 0.019

Since the probability of obtaining a value of M ≥ 111.5 is less than 0.05, reject $H_0$.
Normal Scores Test

For this test the test statistic is

$$NS = \sum_{i=1}^{N} \left(E(V^i)\right)^2 Z_i = 8.099$$

As with the M test, no critical value tables exist for NS when N ≤ 20. The exact probability of obtaining an NS value greater than or equal to 8.099 is 22/924 = 0.024. This is found in exactly the same manner as the significance level for the M test. Once again $H_0$ is rejected.
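The NS tail probability can be enumerated in the same way. This sketch assumes the rounded $E(V^i)$ values listed in Table I and squares them to three decimals, as the table does; the variable names are our own.

```python
from itertools import combinations

# Expected normal order statistics for N = 12, as listed in Table I
e_v = [-1.629, -1.116, -0.793, -0.537, -0.312, -0.103,
       0.103, 0.312, 0.537, 0.793, 1.116, 1.629]
scores = [round(e * e, 3) for e in e_v]      # (E(V^i))^2 to three decimals
experimental_ranks = [1, 3, 9, 10, 11, 12]   # ranks (i) of the experimental group
observed = sum(scores[i - 1] for i in experimental_ranks)

hits = total = 0
for subset in combinations(scores, 6):
    total += 1
    if sum(subset) >= observed - 1e-9:       # tolerance for float round-off
        hits += 1

print(round(observed, 3), hits, total)       # 8.099 22 924
```

The nearest excluded split totals 8.097, only 0.002 below the observed value, so the count is sensitive to how the squared scores are rounded.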



Example II 

Fifty first-grade boys known to have a low expectation of success on intellectual tasks and high anxiety about performance in school were randomly assigned to either an arousal or a non-arousal condition for purposes of experimentation. The arousal group was verbally encouraged to try exceedingly hard to accomplish a specified task. The non-arousal group was told not to worry about their performance on the task, but simply to try to have a good time. The dependent variable of interest was the amount of time (in seconds) each boy would continue attempting to solve a difficult puzzle. The results were as follows:



Arousal

139  360  295  360  335
130  181   91  182  203
153  360  155  225   71
124   38   36  203  294
175  360  360   45  189

Non-Arousal

360   49  140  120  162
131  129  249   38   44
 82  195   47  138   65
287   54  133   62  220
131  118   98  131   90



The experimenter felt that score variability would be greater for the arousal group than for the non-arousal group. Test the hypothesis that there is no difference in scale between the two groups. Let α = 0.05. Data needed to calculate the large-sample normal approximations for the scale tests of interest are presented in Table II.

(Insert Table II here)
The experimenter is proposing a directional alternative hypothesis. Therefore, all tests except the S-T test will be performed with α located in the upper tail of the distribution. In the case of the S-T test, $H_0$ will be rejected for large negative values of the test statistic.

F Test 

$$F = \frac{s_A^2}{s_{NA}^2} = \frac{12096.42}{6430.08} = 1.88$$

The hypothesis ($H_0$) states that there is no significant difference between the variances of the two populations from which the samples are drawn. Reject $H_0$ if $F > F_{.95}(24, 24) = 1.98$. Since F = 1.88 is less than 1.98, fail to reject $H_0$. The experimenter's conjecture is not borne out: the arousal condition does not produce significantly greater variation among the time scores than the non-arousal condition.
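The two sample variances can be reproduced from the raw times. This sketch uses the 50 times that are consistent with the ranks in Table II; with these values both reported variances are matched.

```python
from statistics import variance

arousal = [139, 360, 295, 360, 335, 130, 181, 91, 182, 203,
           153, 360, 155, 225, 71, 124, 38, 36, 203, 294,
           175, 360, 360, 45, 189]
non_arousal = [360, 49, 140, 120, 162, 131, 129, 249, 38, 44,
               82, 195, 47, 138, 65, 287, 54, 133, 62, 220,
               131, 118, 98, 131, 90]

s2_a = float(variance(arousal))       # divisor n - 1
s2_na = float(variance(non_arousal))
print(f"{s2_a:.2f} {s2_na:.2f} {s2_a / s2_na:.2f}")   # 12096.42 6430.08 1.88
```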

Siegel-Tukey Test

Since N is greater than 20, the normal approximation is appropriate for testing $H_0$. The statistic is the sum of the S-T ranks for the arousal group:

$$S\text{-}T = \sum_{i=1}^{N} i Z_i = 585$$

$$E(S\text{-}T) = \frac{n(N+1)}{2} = \frac{25(51)}{2} = 637.5$$

$$\operatorname{Var}(S\text{-}T) = \frac{nm(N+1)}{12} = \frac{25(25)(51)}{12} = 2656.25$$

The test statistic is

$$Z = \frac{S\text{-}T - E(S\text{-}T)}{\sqrt{\operatorname{Var}(S\text{-}T)}} = \frac{585 - 637.5}{\sqrt{2656.25}} = -1.02$$

For the S-T test, $H_0$ will be rejected in favor of greater dispersion in the arousal group when Z is less than -1.645. Since Z = -1.02, $H_0$ is not rejected.

Mood Test

Replace each time score of the combined sample by its rank. The statistic is the sum of the squared rank deviations about the mean for the arousal group:

$$M = \sum_{i=1}^{N} \left(i - \frac{N+1}{2}\right)^2 Z_i = 5936.25$$

$$E(M) = \frac{n(N^2-1)}{12} = \frac{25(2499)}{12} = 5206.25$$

$$\operatorname{Var}(M) = \frac{nm(N+1)(N^2-4)}{180} = \frac{25(25)(51)(2496)}{180} = 442000$$

The test statistic is

$$Z = \frac{M - E(M)}{\sqrt{\operatorname{Var}(M)}} = \frac{5936.25 - 5206.25}{\sqrt{442000}} = 1.10$$

The decision rule is to reject $H_0$ when $Z > Z_{.95} = 1.645$. Once again $H_0$ is not rejected. The time score dispersion is not significantly different for the arousal and non-arousal populations.

Normal Scores Test

The procedures are identical to those of the M test except that expected normal order statistics replace ranks. The statistic is the sum of the squared expected normal order statistics for the arousal group:

$$NS = \sum_{i=1}^{N} \left(E(V^i)\right)^2 Z_i = 29.625$$

$$E(NS) = \frac{n}{N} \sum_{i=1}^{N} \left(E(V^i)\right)^2 = \frac{25(47.434)}{50} = 23.72$$

$$\operatorname{Var}(NS) = \frac{nm}{N(N-1)} \sum_{i=1}^{N} \left(E(V^i)\right)^4 - \frac{m\,\left(E(NS)\right)^2}{n(N-1)} = \frac{25(25)(119.56)}{50(49)} - \frac{25(23.72)^2}{25(49)} = 19.02$$

Using the large-sample approximation, the test statistic is

$$Z = \frac{NS - E(NS)}{\sqrt{\operatorname{Var}(NS)}} = \frac{29.625 - 23.72}{\sqrt{19.02}} = 1.35$$

Since Z = 1.35 is less than 1.645, fail to reject $H_0$.

Conclusion 

It is apparent from the results of the two examples that all of the tests tend to give equivalent answers; at the least, the conclusions are consistent regardless of which test is used to test $H_0$. This agreement is largely a function of the distribution of the data in the two examples, and it will not always be as consistent. Klotz (1961) has compared the relative efficiency of the S-T, M and NS tests for a number of specified distributions. For scores drawn from distributions with sharp tails (exponential, rectangular, etc.), the NS test is preferred to S-T and is as effective as M. When the distribution of scores has heavy tails (Cauchy, etc.), use the S-T test for testing equality of scale. Naturally, when the data are normally distributed the F test is most powerful. Assuming normality of scores, the asymptotic relative efficiency of S-T to F is 0.61, of M to F is 0.76, and of NS to F is 1.0. Bradley (1968), Conover (1971), and Gibbons (1971) provide excellent coverage and development of the more commonly used nonparametric tests for scale.
Table I

Abstract Reasoning Data for the F, Siegel-Tukey, Mood and Normal Scores Tests

 Abstract  Rank   S-T  (i-(N+1)/2)^2   E(V^i)  (E(V^i))^2
Reasoning   (i)  Rank

Exp.
       19     1     1          30.25   -1.629       2.654
       21     3     5          12.25   -0.793       0.629
       27     9     7           6.25    0.537       0.288
       30    10     6          12.25    0.793       0.629
       31    11     3          20.25    1.116       1.245
       35    12     2          30.25    1.629       2.654
    Total    46    24         111.50    1.653       8.099

Cont.
       20     2     4          20.25   -1.116       1.245
       22     4     8           6.25   -0.537       0.288
       23     5     9           2.25   -0.312       0.097
       23     6    12           0.25   -0.103       0.011
       25     7    11           0.25    0.103       0.011
       26     8    10           2.25    0.312       0.097
    Total    32    54          31.50   -1.653       1.749
Table II

Arousal - Non-Arousal Data for the F, Siegel-Tukey, Mood and Normal Scores Tests

Arousal

  Time  Rank   S-T  (i-(N+1)/2)^2   E(V^i)  (E(V^i))^2  (E(V^i))^4
(sec.)   (i)  Rank
   139    26    50           0.25    0.025       0.001       0.000
   130    20    40          30.25   -0.278       0.077       0.006
   153    28    46           6.25    0.125       0.016       0.000
   124    18    36          56.25   -0.384       0.147       0.022
   175    31    39          30.25    0.278       0.077       0.006
   360    50     2         600.25    2.249       5.058      25.583
   181    32    38          42.25    0.330       0.109       0.012
   360    49     3         552.25    1.855       3.441      11.841
    38     3     5         506.25   -1.629       2.654       7.042
   360    47     7         462.25    1.464       2.143       4.594
   295    43    15         306.25    1.030       1.061       1.126
    91    14    28         132.25   -0.610       0.372       0.138
   155    29    43          12.25    0.176       0.031       0.001
    36     1     1         600.25   -2.249       5.058      25.583
   360    46    10         420.25    1.331       1.772       3.138
   360    45    11         380.25    1.219       1.483       2.208
   182    33    35          56.25    0.384       0.147       0.022
   225    39    23         182.25    0.735       0.540       0.292
   203    37    27         132.25    0.610       0.372       0.138
    45     5     9         420.25   -1.331       1.772       3.138
   335    44    14         342.25    1.120       1.254       1.574
   203    36    30         110.25    0.551       0.304       0.092
    71    11    21         210.25   -0.802       0.643       0.414
   294    42    18         272.25    0.949       0.901       0.811
   189    34    34          72.25    0.438       0.192       0.037
Totals   763   585        5936.25    7.586      29.625      87.818

Table II (cont.)

Non-Arousal

  Time  Rank   S-T  (i-(N+1)/2)^2   E(V^i)  (E(V^i))^2  (E(V^i))^4
(sec.)   (i)  Rank
   360    48     6         506.25    1.629       2.654       7.042
   131    23    45           6.25   -0.125       0.016       0.000
    82    12    24         182.25   -0.735       0.540       0.292
   287    41    19         240.25    0.873       0.762       0.581
   131    22    44          12.25   -0.176       0.031       0.001
    49     7    13         342.25   -1.120       1.254       1.574
   129    19    37          42.25   -0.330       0.109       0.012
   195    35    31          90.25    0.494       0.244       0.060
    54     8    16         306.25   -1.030       1.061       1.126
   118    16    32          90.25   -0.494       0.244       0.060
   140    27    47           2.25    0.075       0.006       0.000
   249    40    22         210.25    0.802       0.643       0.414
    47     6    12         380.25   -1.219       1.486       2.208
   133    24    48           2.25   -0.075       0.006       0.000
    98    15    29         110.25   -0.551       0.304       0.092
   120    17    33          72.25   -0.438       0.192       0.037
    38     2     4         552.25   -1.855       3.441      11.841
   138    25    49           0.25   -0.025       0.006       0.000
    62     9    17         272.25   -0.949       0.901       0.811
   131    21    41          20.25   -0.227       0.052       0.003
   162    30    42          20.25    0.227       0.052       0.003
    44     4     8         462.25   -1.464       2.143       4.594
    65    10    20         240.25   -0.873       0.762       0.581
   220    38    26         156.25    0.671       0.450       0.203
    90    13    25         156.25   -0.671       0.450       0.203
Totals   512   690        4476.25   -7.586      17.809      31.738



